Integrating active learning and crowdsourcing into large-scale supervised landcover mapping algorithms
Abstract
Sub-Saharan Africa and other developing regions of the world are dominated by smallholder farms, which are characterized by small, heterogeneous, and often indistinct field patterns. In previous work, we developed an algorithm for mapping both smallholder and commercial agricultural fields that includes efficient extraction of a vast set of simple, highly correlated, and interdependent features, followed by a random forest classifier. In this paper, we demonstrated how active learning can be incorporated into the algorithm to create smaller, more efficient training data sets, which reduced the computational resources required, minimized the need for humans to hand-label data, and boosted performance. We designed a patch-based uncertainty metric to drive the active learning framework, based on the regular grid of a crowdsourcing platform, and demonstrated how subject matter experts can be replaced with fleets of crowdsourcing workers. Our active learning algorithm achieved performance similar to that of an algorithm trained with randomly selected data, but with 62% fewer data samples.
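The abstract does not define the patch-based uncertainty metric, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a binary field/non-field classifier that outputs a per-pixel probability, scores each pixel as 1 - 2|p - 0.5|, averages those scores over each cell of a regular grid, and sends the most uncertain cells for labeling. The names `pixel_uncertainty`, `patch_uncertainties`, and `select_patches`, the scoring formula, and the example values are all hypothetical.

```python
from typing import Dict, List, Tuple

Patch = Tuple[int, int]  # (patch_row, patch_col) in the regular grid


def pixel_uncertainty(p: float) -> float:
    """Assumed metric: 1.0 when p == 0.5, falling to 0.0 at p == 0 or p == 1."""
    return 1.0 - 2.0 * abs(p - 0.5)


def patch_uncertainties(prob_map: List[List[float]], patch_size: int) -> Dict[Patch, float]:
    """Average the per-pixel uncertainty over each cell of a regular grid."""
    n_rows, n_cols = len(prob_map), len(prob_map[0])
    scores: Dict[Patch, float] = {}
    for top in range(0, n_rows, patch_size):
        for left in range(0, n_cols, patch_size):
            values = [
                pixel_uncertainty(prob_map[r][c])
                for r in range(top, min(top + patch_size, n_rows))
                for c in range(left, min(left + patch_size, n_cols))
            ]
            scores[(top // patch_size, left // patch_size)] = sum(values) / len(values)
    return scores


def select_patches(scores: Dict[Patch, float], k: int) -> List[Patch]:
    """Return the k most uncertain patches; ties are broken by grid position."""
    return sorted(scores, key=lambda patch: (-scores[patch], patch))[:k]


# Made-up classifier output for a 4x4 image split into 2x2 patches.
example_map = [
    [0.90, 0.95, 0.50, 0.60],
    [0.85, 1.00, 0.40, 0.50],
    [0.00, 0.10, 0.20, 0.90],
    [0.05, 0.00, 0.80, 0.10],
]
example_scores = patch_uncertainties(example_map, patch_size=2)
print(select_patches(example_scores, k=2))  # [(0, 1), (1, 1)]
```

In an active learning loop of this kind, the selected patches would be labeled (here, by crowdsourcing workers), added to the training set, and the classifier retrained before the next round of selection.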
Similar resources
Wrapper Generation Supervised by a Noisy Crowd
We present solutions based on crowdsourcing platforms to support large-scale production of accurate wrappers around data-intensive websites. Our approach is based on supervised wrapper induction algorithms which demand the burden of generating the training data to the workers of a crowdsourcing platform. Workers are paid for answering simple membership queries chosen by the system. We present t...
The Interaction Between Supervised Learning and Crowdsourcing
In this paper we report insights on combining supervised learning methods and crowdsourcing to annotate the sentiment of a large number of economic news articles. The application entailed using annotations from a group of non-expert annotators on a small subset of articles to train a classifier that would annotate a large corpus of articles. This presents an active learning problem where the ch...
Perform Three Data Mining Tasks with Crowdsourcing Process
For data mining studies, because of the complexity of doing feature selection process in tasks by hand, we need to send some of labeling to the workers with crowdsourcing activities. The process of outsourcing data mining tasks to users is often handled by software systems without enough knowledge of the age or geography of the users' residence. Uncertainty about the performance of virtual user...
Reactive Learning: Actively Trading Off Larger Noisier Training Sets Against Smaller Cleaner Ones
One of the most popular uses of crowdsourcing is to provide training data for supervised machine learning algorithms. Because of imperfect workers, requesters commonly ask multiple workers to redundantly label each example. When the goal is to train the best classifier at the lowest cost, active learning can intelligently pick new examples to label. However, active learning fails to address a f...
Active Learning with Amazon Mechanical Turk
Supervised classification needs large amounts of annotated training data that is expensive to create. Two approaches that reduce the cost of annotation are active learning and crowdsourcing. However, these two approaches have not been combined successfully to date. We evaluate the utility of active learning in crowdsourcing on two tasks, named entity recognition and sentiment detection, and sho...
Journal title:
- PeerJ PrePrints
Volume 5
Publication date: 2017